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ABSTRACT 

The detection and flux estimation of point sources in cosmic microwave background 
(CMB) maps is a very important task in order to clean the maps and also to obtain 
relevant astrophysical information. In this paper we propose a maximum a posteriori 
(MAP) approach detection method in a Bayesian scheme which incorporates prior in- 
formation about the source flux distribution, the locations and the number of sources. 
We apply this method to CMB simulations with the characteristics of the Planck 
satellite channels at 30, 44, 70 and 100 GHz. With a similar level of spurious sources, 
our method yields more complete catalogues than the matched filter with a Scr thresh- 
old. Besides, the new technique allows us to fix the number of detected sources in a 
non-arbitrary way. 

Key words: methods: data analysis - techniques: image processing - radio contin- 
uum: galaxies ~ cosmic microwave background 



1 INTRODUCTION 

The detection and estimation of the intensity of compact 
objects embedded in a background plus instrumental noise 
is a problem of interest in many different areas of sci- 
ence and engineering. A classic example is the detection 
of point-like extragalactic objects -i.e. galaxies- in sub- 
millimetric Astronomy. Regarding this particular field of 
interest, different techniques have proven useful in the lit- 
erature. Some of the existing techn iques are: the standard 
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mat ched filter (MF'. iNailonSflQQ^). the m atched multifil- 
ter (|Herranz et al] |2002| : iLanz et al I [2OI0I') or the recentl y 
developed matched matrix filters ( Herranz fc Sanj|2008l ). 
Other methods include continuous w avelets like the stan- 
dard Mexican JHat (Sanz ct al. 2006) and other members 
of its family (|Gonzalez-NuGvo et al., .2006 ). All these filters 
have been applied to real data of the Cosmic Microwave 
Background (CMB), like those obt ained by the WMAP 
satellite ( L opez-Caniego et al.l |2007| ) and CMB simulated 
data (|Leach et aL 20081) for t he experiment on board the 
Planck satellite ( Taubeill2005l ). Besi des, Bayesian methods 
have also been recently d eveloped (|Hobson fc McLachlanI 
l2003l : [Carvalho et al.ll200^ ). A more detailed review on point 
source detection techniques in microwave and sub-mm As- 
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When a MF or a wavelet is applied to a CMB map in 
the blind detection case, i.e. when it is assumed that the 
number of point sources, their positions and fluxes are un- 
known, the most common method for detection is based on 
the well-known idea of thresholding: the maxima of the fil- 
tered map above a given threshold are selected and consid- 
ered as the positions of the sources, so that the number of 
detected sources is the number of maxima above that thresh- 
old. The fluxes are estimated then by using the correspond- 
ing estimation formulas with the MF or the wavelet. The 
value of this threshold remains arbitrary, though a 5a cut is 
often applied, since it guarantees that under reasonable con- 
ditions a few detected sources are spurious. Apart from the 
arbitrariness of this procedure, the prior knowledge regard- 
ing the average number of sources in the surveyed patch, the 
flux distribution of these sources or other properties are not 
used, so that useful information is being neglected. 

Bayesian detection techniques provide a natural way to 
take into account all the available information about the 
statistical distribution of both the sources and the noise. 
Unfortunately, up to this date only a few works have ad- 
dressed the pro blem of detecting extragalac t ic point sources 
in CM B data l|Hobson fc McLachlanI l2003l : ICarvalho et all 
I2OO9I ). The reason for this is twofold: on the one hand, the 
statistical properties of extragalactic sources at sub-mm fre- 
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quencies are still very poorly known. On the other hand, 
mapping the full posterior probability density of the sources 
is often very difficult and computationally expensive. These 
two problems explain, at least partially, the predominance 
of frequentist over Bayesian methods in the literature. Let 
us consider the previous two problems separately: 

The microwave and sub-mm region has been until very 
recently one of the last uncharted areas in astronomy. Con- 
cerning extragalactic sources, this region of the electromag- 
netic spectrum is where the total number of counts passes 
from being dominated by radio-loud galaxies to being dom- 
inated by dusty galaxies. Although a minimum of the emis- 
sion coming from extragalactic sources is expected to occur 
around 100-300 GHz, they are still considered as the main 
contaminan t of the CMB at small angular scales at these 
frequencies l|Toffolatti et aLlllQgslJde Zotti et al.ll2005l 'l. The 
uncertainties about the number counts at intermediate and 
low flux, redshift distribution, evolution and clustering prop- 
erties of this mixed population of objects are large. In most 
cases this has motivated the use of noninformative priors, 
which avoid to make adventurous assumptions about the 
sources but on the other hand miss part of the power of the 
priors that are based on observations and physical intuition. 

But, in spite of what has been said above, our knowledge 
about the statistical properties of point sources is growing 
day by day thanks to the new generation of surveys and 
experiments. In the high-frequency radio regime, WMAP 
observations are in agreement with the de Z otti model 
ijde Zotti et al.l2005l : [Gonzalez-Nuevo et al.ll2008l ). Priors for 
the number density and flux distributions in the range of fre- 
quencies > 5 GHz are more and more reliable thanks to the 
information provided by recent surveys such as CRATES 
at 8.4 GH z jHealev et all 120071') . the Rvle- Telescop e 9C at 
15.2 GHz (jTavlor et al.l |20Ql': 'Waldram et al."'2003' ) or the 
AT20G survey at 20 GHz j llicci ct al. 2004; Massa rdi et all 
l2008l : iMahonv et all boid l. For a recent review on radio 
an d millimiter surveys a nd their astrophysical implications, 
see Ide Zotti et al.l (|201G| ) . The situation is worse in the far- 
infrared part of the spectrum, where relatively large uncer- 
tainties remain in the statistical properties, the evolution 
and, above all, the clustering properties of dusty galaxies. 
Most of the existing dusty galaxy surveys have been car- 
ried out in the near and medium infrared with IRAS, ISO 
and Spitzer, but the wave-band from 60 to 500 /xm is still 
virtually terra incognita. The only survey of a large area of 
the extragalactic sky at a wavelength above 200 /im is the 
one recently carried out by the Herschel pathfinder exper- 
iment, the Balloo n Large Area Survey Telescope (BLAST, 
iDevlin et ai] |2009''). In the next few months, however, the lu- 
minosity function and the dust-mass function of dusty galax- 
ies in the nearby Universe will be much better understoo d 
thanks to the Herschel- ATLAS Survey (|Eales et al.ll2010h . 
which covers the wavelength range between 110 and 500 /im 
and has already produced interesting results during the Her - 
schel Science Demonstration Phase (|Clements et al.ll201Cf ). 
Thanks to these and the previously mentioned observations, 
the sub-mm gap is narrowing and our knowledge of galaxy 
populations in this wave band, albeit far from perfect, is 
quickly improving. 

Apart from the uncertainties on the priors, the other 
complication that has traditionally deterred microwave as- 
tronomers from attempting Bayesian point source detection 



is computational and algorithmic complexity. Depending on 
the choice of priors and the likelihood function, the full pos- 
terior distribution of the parameters of the sources may be 
very complex and in most cases it is impossible to obtain 
maximum a posteriori (MAP) values of the parameters and 
their associated errors via analytical equations. Numerical 
sampling techniques such as Monte Carlo Markov Chain 
(MCMC) methods are required in order to solve the infer- 
ence problem, but these methods are computationally in- 
tensive. It is thus necessary to apply computing techniques 
specifically tailored for accelerating the convergence and 
impr oving the efficiency of the sampling (jPeroz fc Hobsoi] 
boOSh and/or to find sma rt approximations of the poste- 
rior near its local maxima l|Carvalho et al.ll2009l '). But these 
enhancements have the cost of increasing dramatically the 
algorithmic complexity of the detection software, introduc- 
ing new layers of intricacy in the form not only of addi- 
tional assumptions and routines, but also of regularization 
'constants', hidden variables, hyperparameters and selection 
thresholds that in many cases must be fine-tuned manually 
in order to be adapted to the specific circumstances of a 
given data set. The complexity of the algorithms can rise to 
almost baroque levels, having a negative effect on the porta- 
bility of the codes and on the reproducibility of the results. 

We propose in this paper a simple strategy based on 
Bayesian methodology which incorporates sensible prior in- 
formation about the source locations, the source fluxes and 
the source number distribution. With these priors and as- 
suming a Gaussian likelihood, we can obtain an explicit form 
of the negative log-posterior of the number of sources and 
their fluxes and positions. Assuming a MAP methodology, 
we introduce a straightforward top-to-bottom detection al- 
gorithm that allows us to determine the number, fluxes and 
positions of the sources. We give a simple proof that the po- 
sitions of the sources must be located in the local maxima of 
the matched-filtered image if there is not a significant over- 
lap between sources. The main computational requirement 
of our algorithm is the solution of a system of non-linear 
equations . Our method differs from the one presented by 
^Carvalho et al.l (|2009l ') in five main points: 

(i) We use a more realistic set of priors for the source 
number, intensity and location distributions. In particular, 
our choice of the prior on the locations is also flat but de- 
pends on the number of sources n, which later proves to be 
decisive for the log-posterior. 

(ii) We obtain an explicit form of the negative log- 
posterior and an explicit solution of the MAP estimate of 
the source intensities as the solution of a non-linear system 
of equations. 

(iii) We prove that, for non-overlapping sources and a 
Gaussian likelihood, the MAP estimation of the positions 
of the sources is given by the location of the local maxima 
of the matched filtered images. 

(iv) Since we are interested only in point sources, we fix 
the size parameter of the objects to be detected. 

(v) We can also find the MAP solution for the number of 
sources present in the images with a simple top-to-bottom 
search strategy. We do not need to resort to costly evalua- 
tions of the Bayesian evidence. 

The layout of the paper is as follows: in section [2] we 
present the method and derive the corresponding posterior 
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which includes the data hkelihood and the priors. In sec- 
tion [3] we apply the method to CMB simulations with the 
characteristics of the radio Planck channels (from 30 to 100 
GHz, where the number count priors are most reliable) and 
compare it with the standard procedure of using a MF with 
a 5(7 threshold. The main results are also presented in sec- 
tion [S] The conclusions are given in the final section. 



2 METHODOLOGY 

In a region of the celestial sphere, we suppose to have an 
unknown number n of radio sources that can be consid- 
ered as point-like objects if compared to the angular reso- 
lution of our instruments. This means that their actual size 
is smaller than our smallest resolution cell. The emission of 
these sources is superimposed to a radiation f{x, y) coming 
from diffuse or extended sources. In our particular case this 
radiation is the CMB plus foreground radiation. A model 
for the emission as a function of the position [x, y) is 



d{x, y) = f{x, + ^ aa5{x - Xa,y -y^), 



(1) 



where S{x,y) is the 2D Dirac delta function, the pairs 
{xa, ya) are the locations of the point sources in our region 
of the celestial sphere, and are their fluxes. We observe 
this radiation through an instrument, with beam pattern 
b{x, y), and a sensor that adds a random noise n{x, y) to the 
signal measured. Again, as a function of the position, the 
output of our instrument is: 

n 

d{x,y) = ^ac.b{x-xc„y-ya) + {.f*b){x,y)+n{x,y), (2) 

a = l 

where the point sources and the diffuse radiation have been 
convolved with the beam. In our application, we are inter- 
ested in extracting the locations and the fluxes of the point 
sources. We thus assume that the fluxes of the point sources 
are sufHciently above the level of the rest of the signal plus 
the noise, and consider the latter as just a disturbance su- 
perimposed to the useful signal. If e{x,y) is the sum of the 
diffuse signal plus the noise, model ((2]) becomes 



d{x,y) 



E 



ac,h{x - Xa,y - ya) + >^{x, y). 



(3) 



If our data set is a discrete map of TV pixels, the above 
equation can easily be rewritten in vector form, by letting 
d be the lexicographically ordered version of the discrete 
map d{x, J/), a be the n- vector containing the positive source 
fluxes aa, e the lexicographically ordered version of the dis- 
crete map, e{x, y), and be an A*' x n matrix whose columns 
are the lexicographically ordered versions of n replicas of 
the map b{x, y), each shifted on one of the source locations. 
Equation ([2} thus becomes 



(f>a 



(4) 



Looking at equations ^ and Q, we see that, if the 
goal is to find locations and fluxes of the point sources, our 
unknowns are the number n, the list of locations {xa,ya), 
with a — l,...,n and the vector a. It is apparent that, 
once n and {xa,ya) are known, matrix </> is perfectly de- 
termined. Let us then denote the list of source locations by 



the n X 2 matrix R, containing all their coordinates. If we 
want to adopt a Bayesian strategy to solve our problem, we 
must be able to write the posterior probability density of 
our unknowns. A suitable estimation criterion must then be 
chosen. 



2.1 Posterior 

By the Bayes rule, the posterior we are looking for has the 
following form 

p(n, R, ajd) oc p(d|n, R, a)p(n, R, a) (5) 

where p(d|n, R, a) is the likelihood function, derived from 
our data model (U). To find the prior density p{n, R, a) we 
need to make a number of assumptions. Let us first observe 
that, in principle, both R and a depend on n, through the 
number of their elements. On the other hand, we can safely 
assume that, once n is fixed, the fluxes a of the sources are 
independent of their locations. These assumptions lead us 
to write 

p{n, R, a) = p(R, a.\n)p{n) = p{Ti\n)p{a.\n)p{n) (6) 

This expression is valid when we consider extragalactic point 
sources, whose fluxes are not related to their positions. This 
will be the case in this paper. 

2.2 Likelihood function 

As mentioned above, the likelihood function derives from the 
physics associated to the assumed data model. In general, 
unfortunately, a data model of type ^ is difficult to describe 
statistically. We are going to assume from now on that e is a 
random Gaussian field with zero mean and known covariance 
matrix 5. This is true if we only consider the CMB and the 
instrumental noise, excluding other foregrounds. In this pa- 
per we will deal with zones of the sky where the foreground 
contribution is not important or where the foregrounds have 
been conveniently removed by component separation tech- 
niques. The likelihood is thus 

(d-(?!)a)*^-^(d-(^8 



p(d|n, R, a) oc exp 



(7) 



Observe that the negative of the exponent in ([7| is in any 
case the squared ^~^-norm fit of the reconstructed data to 
the measurements, and this always carries information about 
the goodness of our estimate. However, if the Gaussian as- 
sumption is not verified, function ((7| is not the likelihood 
of our parameters, and when it is introduced in (O, we do 
not obtain the posterior distribution we are looking for. In 
our simulations we will also include the confusion noise due 
to faint extragalactic sources. This confusion noise is not 
Gaussian but as we will see later, it does not hamper the 
detections, since its standard deviation is much lower than 
that of the CMB plus the instrumental noise for the fre- 
quencies considered in this paper. For the sake of simplicity, 
we will defer to further papers the treatment of the more 
general case which includes the foregrounds. 

2.3 Prior on source locations 

A priori, it is reasonable to assume that all the different 
combinations of n distinct locations occur with the same 
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probability. Then function p(R|n) in Eq. (|6} can be written 
as 



p(R|n) = 



n\{N -n)\ 
iV! ' 



(8) 



since A'^!/(7i!(7V — n)!) is the number of possible distinct lists 
of n locations in a discrete A''-pixel map. This assumption is 
based on the fact that the sources considered in this paper 
are spatially uncorrelated. 



2.4 Prior on fluxes 

Experimentally, it has been found that the fluxes of the 
strongest sources are roughly distributed as a negative power 
law, with exponent 7. Conversely, the weak sources have 
fluxes that are roughly uniformly distributed. To include 
these two behaviors into a single formula, one should first 
discriminate in some way between weak and strong sources. 
This can be done empirically, by establishing a sort of 
threshold ao on the fluxes and a conditional prior with th e 
form of the Generalized Cauchy Distribution l|Rideij| 19571 ): 



p(a|n)ocn[l+(S)^' 



(9) 



with p a positive number. Distribution ([9]) obviously as- 
sumes that the fluxes of the different sources are mutually 
independent. This prior clearly shows the behavior required 
for strong and weak sources. In order to work with non- 
dimensional quantities, we define Xa = aa/ao; we also as- 
sume that we will detect point sources above a minimum 
flux am, that leads to the following normalized distribution 



p(x|n) = Yi 



R { 1 7-1 1 



(10) 

where B is the incomplete beta function and Xm = flm/ao- 
In the next section, we determine the values of ao, p and 7, 
by fitting this formula to the p oint source distribut ion given 
by the de Zotti counts model (|de Zotti et al.ll2005h . 



2.5 Prior on the number of sources 

We need to establish a discrete probability distribution 
that expresses the probability of a number of occurrences 
in a fixed domain once their average density is known, 
their locations in the domain are mutually independent, 
and no pair of sources can occur in the same location. 
All these assumptions seem reasonable when applied to 
the configurations of the point sou rces in the celestial 
sphere, at least for radio-fr equencies (|Argiieso et al.l [20031 : 
iGonzalez-Nuevo et al.|[2005l ). Assuming a continuous map 
domain, all the requirements mentioned are satisfied by the 
Poisson distribution. Strictly speaking, we have a discrete 
A^-pixel map, so a binomial distribution should be used, but 
if A'' is not too small, a Poisson distribution should model 
correctly the probability of having n occurrences of point 
sources. The prior on n appearing in (|6} is thus 



p{n) = 



A e 



where A, the intensity of the Poisson variable, is the expected 
number of sources in the map at hand. The value of A will 
depend on the flux detection limit am, the size of the map 
and the wavelength of the observation. 

2.6 An explicit expression for the negative 
log-posterior 

If we multiply all the factors which appear in ((SJ and © 
and calculate the negative log-posterior, we find (apart from 
additive constants) 



L(n,R,x) = -(x*Mx-2e*x) -log(Ar-n)!-nlog(A) 

1 7-11' 



nlog(p) + nlog-B 



+ 2 Vlog(l-f:rS), 
p ^ — ' 



1 + p ' p 



(12) 



with M = aa4>^^^^4> and e = ao</)*$"^d. The correlation 
matrix ^ is computed b y using the Ci's o btained from the 
WMAP five-year maps (|Noha et al.ll2009l 'l and adding the 
instrumental noise. We assume that we know ao, p, Xi, 7 
and A, i n fact we calculate th em by using the de Zotti counts 
model (|de Zotti et al.1 120051 ). Therefore, the unknowns are: 
the normalized fluxes x, the number of point sources n and 
the positions of the point sources through the matrix (/>(R). 

Let us now examine the structure of function (|12p . The 
first term comes from the likelihood, and obviously decreases 
as much as our solution fits the data. The second term takes 
into account the prior for the source configuration and pe- 
nalizes large values of n. The third term comes from the 
prior on the number of sources and, depending on the value 
of A, favors (A > 1) or disfavors (A < 1) the increase of the 
number of sources. The following terms come from the prior 
on the source fluxes conditioned to n, the last one introduces 
an additional cost as soon as a new source is added to the 
solution. If A'^ ^ A and A S> n, what is typical in CMB 
maps, it can be proven by using Stirling's approximation 
that the second term is the dominant one coming from the 
priors. In the next section, we will analyze with simulations 
the contribution of each particular term. 



2.7 Maximum a posteriori (MAP) solution 

Formula (|12p includes all the information about the po- 
sitions, fluxes and number of sources. In order to obtain 
concrete results, we will choose the values of R, x and n 
which maximize the posterior. This choice will be justified 
by means of the simulations and results that we will present 
in the next section. 

Therefore, regarding the flux we minimize (|12p with re- 
spect to x, by taking the derivative and equating to zero 
and we obtain 



+ 



/3=i 



l + x£ 



= 0. 



(13) 



(11) 



By solving (|13p numerically we would obtain the estimator of 
X which yields the maximum posterior probability. However, 
we know neither the number of sources nor their positions. In 
order to determine the positions, we assume that the point 
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sources are in the local maxima of e, which is the matched- 
filtered map of the original data. In the following, we will 
show that this assumption can be safely adopted, since the 
minima of L must be in the maxima of the matched-filtered 
map (we also remark that the matched filter is not intro- 
duced ad hoc, but it appears naturally as a part of the for- 
malism) . 

For simplicity, let us assume that we have only one 
source, in this case the terms of L which depend on the 
flux can be written 

L{x) = -e^x + l log (i + x") , (14) 

2 P 

where Mn and ei are the corresponding values of M and e 
at the pixel supposedly occupied by the point source. If we 
take the derivative of (|14p with respect to x and equate to 
zero, we obtain the following equation for x, the estimator 
of x: 



Miix + 



JX' 



p-i 



1 + xP 



ei 



: x{ei). 



(15) 



-2^ 
-1 



- De Zotti counts 
-Fit 



log10(Flux Jy) 



Figure 1. logj^Q of the differential counts plotted against the 
flux for the de Zotti model (green line) and the fit to the extended 
power- law given by ||9ll with the parameters of Table[T](blue line). 
The two lines are nearly indistinguishable. 



If we substitute the last expression in eq. (|14p . we can write 
the expression for the negative log-posterior L(£(ei)) 



L(i(ei)) = 



Miix^ 



1+XP 



^ log (1 -h F) . (16) 



By taking the derivative of this formula with respect to ei, 
we finally find 



dL dL / dei \ ' 
dei dx \ dx J 



-x{ei) 



(17) 



where we have calculated the derivative in (|15p . Since the 
estimated value of the fiux must be positive, this expres- 
sion shows that the negative log-posterior at the estimated 
value of X decreases with ei, so it is minimum at the highest 
value of ei i.e. at the maximum of the matched-filtered map. 
Therefore, the posterior, calculated at the estimated flux 
value, is maximum when we assume that the point source is 
at a peak of the map. This conclusion is valid if we have more 
than one source, provided that there is no overlap between 
sources, i.e. the areas where the individual images of all the 
sources are nonzero must be completely disjoint, because in 
this case each source can be treated individually. 

In order to determine the number of sources, we sort 
these local peaks from top to bottom and solve p3[) suc- 
cessively adding a new source. At the same time, we calcu- 
late the negative log-posterior (|12[) and choose the number 
n of sources which produce the minimum value of (|12p . In 
this way we have constructed an objective stopping crite- 
rion which yields, by combining (|12p and (|13|) . the number 
of sources and their fluxes which maximize the posterior. In 
the next section, we apply the method explained above to 
the detection and flux estimation of point sources in CMB 
maps. 

In order to compare this technique with a standard 
method, we also calculate the local peaks above a certain 
threshold, for instance a 5a threshold, and solve p3p with 
7 = 0, that amounts to using a MF, i.e. a maximum likeli- 
hood estimator. 



3 SIMULATIONS AND RESULTS 
3.1 The simulations 

In order to check the performance of the new technique, we 
have carried out simulations including CMB, instrumental 
noise and point sources. The simulations have the character- 
istics of the 30, 44, 70 and 100 GHz channels of the Planck 
satellite: pixel size, beam width and instrumental nois^H- 
The simulations are flat patches of 32 x 32 pixels (30 and 
44 GHz) and 64 x 64 pixels (70 and 100 GHz), so that the 
size of each patch is 3.66 x 3.66 square degrees. In order to 
avoid border effects, we simulate patches of four times this 
size and keep the central part for our analysis. The small 
size of the simulations allows us to do our calculations in a 
fast way. We perform 1000 simulations for each channel. 

The CMB maps have been generated by using the power 
spectrum, the C^'s, that produces the best fit to the WMAP 
five-year maps (|Nolta et al.ll2009l ). we have also added the 
instrumental noise of the 30, 44, 70 and 100 GHz chan- 
nels of the Planck satellite. Finally, we have simulated point 
sources, by taking into acco unt the flux distrib ution pre- 
dicted by the de Zotti model (|de Zotti et al.ll2005h . We have 
included the faint tail of the de Zotti distribution, simulating 
point sources from 0.01 mjy on. In this way, we have con- 
sidered the confusion noise due to unresolved point sources. 
The standard deviation of this confusion noise is much lower 
than that of the CMB plus instrumental noise in these chan- 
nels. 

For each simulation we consider the negative log- 
posterior given by (|12|l . In this equation we see several mag- 
nitudes, 7, ao, a-m and A, which depend on the frequency. 
By fitting the number counts given in the de Zotti model 
by (|9]), we calculate 7 and oq. We have taken p—1 in (9), 
since the goodness-of-flt obtained by changing p is not bet- 
ter than that of the particular case p—1. The parameter 



^ For details on the Planck instrumental 
scientific performance, see the Planck web 



and 
site 



|http://www.rssd.esa.int/index.php?project=PLANCK[ 
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frequency 


7 


o-o 


A 


30 GHz 


2.90 


0.19 


0.69 


44 GHz 


2.87 


0.15 


0.53 


70 GHz 


2.87 


0.15 


0.49 


100 GHz 


2.87 


0.15 


0.47 



Table 1. Values of the power-law exponent 7 and the flux ao in 
Jy, as obtained by fitting the De Zotti counts. A is the average 
number of point sources above 0.25 Jy in the considered patches. 



A is the average number of sources per patch and am is the 
minimum flux that we consider in our detection scheme, we 
have chosen am = 0.25 Jy, the typical rms deviation of the 
CMB plus noise maps at the frequencies we consider. The 
values of 7, ao and A are shown in Table [T] for the different 
frequencies. In Figure [T] we show as an example the fit to 
our extended power-law ([9| in the case of the 30 GHz chan- 
nel. It is clear that the extended power-law fits very well the 
counts predicted by the de Zotti model. The value of is 
(2 — 3) X 10^'^ giving probabilities very close to 1. 



3.2 Discussion on the performance of the 
algorithm 

For each simulation we calculate M — a^tf/^^^tf) and e — 
ao(/)*^"M. We obtain ^"^ from the WMAP Ci's, taking into 
account the effects of the pixel and the beam windows and 
the corresponding noise levels for each channel. We calcu- 
late the estimated fluxes Xa by solving (|13|l as explained 
in the previous section: we select the maxima of e above 
a certain threshold (we choose a la threshold so that we 
have a suitable number of peaks) and we perform a top to 
bottom strategy, i.e. we sort the local maxima downwards 
from higher to lower values and starting from the highest 
peak we solve (|13[) including in each new iteration a new lo- 
cal maximum. At the same time, we calculate (|12p and stop 
the iterations when we find the minimum value of the neg- 
ative log-posterior. In this way, we obtain the source fiuxes 
and the number of sources that maximize the log-posterior. 
We also calculate the local peaks of e above a 5a thresh- 
old, a standard detection method, and calculate the source 
flux by solving (|13p with 7 = 0, this is equivalent to us- 
ing a MF with a 5a threshold. Our intention is to compare 
the Bayesian method (BM), with prior information and a 
natural stopping criterion, and the standard MF. 

According to our simulations, the fundamental contri- 
butions to the posterior come from the likelihood and 
the prior on source locations ((S)). The other terms also con- 
tribute, but as can be seen in Figure [2j where we show the 
negative log-posterior plotted against the number of sources 
for a particular simulation, their influence is not so impor- 
tant. The likelihood tends to increase the number of detected 
sources, over-fitting the data and the prior ((HJ tends to de- 
crease the number of sources. The combination of ([7| and 
([8| fixes the most probable number of point sources, though 
the other two terms, although less important can have some 
infiuence. This shows the robustness of the method with re- 
spect to small changes in the parameters of priors (10) and 
(11). 

We also raise the question whether the estimated num- 
ber of point sources gives us a clearly higher posterior prob- 
ability, i.e oc exp(— log L) than other close numbers. The 
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Figure 2. Negative log-posterior against the number of detected 
sources for a simulation at 30 GHz. We have included in the 
posterior; all the priors (blue solid line), all the priors but the 
source flux distribution (red dash-dotted line), all the priors but 
the Poisson source number distribution (cyan dashed line) and 
finally we have excluded both the Poisson and the source flux 
distribution (green dotted line). 
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Figure 3. Posterior probability against number of detected point 
sources for a simulation at the 30 GHz channel with one real 



answer can be seen in Figure |3l where we show, as an ex- 
ample, the normalized posterior probability plotted against 
the number n of point sources for a given simulation with 
one real source. The probability is clearly peaked at the esti- 
mated number of sources, which is the real number of point 
sources in this case. In our simulations we observe that the 
posterior probability is always strongly peaked around the 
estimated number of sources. 

We analyze 1000 simulations for each of the considered 
Planck channels: we calculate the contamination (the num- 
ber of detected spurious sources over the total number of 
detected sources above a given flux), the completeness (the 
number of real detected sources over the number of simu- 
lated sources above a given flux) and the average of the ab- 
solute value of the relative error of the estimated flux with 
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Figure 4. Contamination plotted against the flux for the BM 
and the MF (100 GHz). 



Figure 5. Completeness plotted against the flux for the BM and 
the MF (100 GHz). 



respect to the real flux (reconstruction error). We count a 
detected source as real when there is a real simulated source 
at a distance no longer than two pixels from the detected 
one, this distance is the position error. This real source must 
have a flux equal or higher than 0.20 Jy, to fix a threshold 
close to the la level of the CMB plus noise map. The same 
conditions are required for the MF. 

In order to give the uncertainty in the flux, derived from 
our Bayesian approach, we will obtain a 95% confidence in- 
terval associated to our probability distribution 



P{x) oc exp(— L(a;)), 



(18) 



where L{x) is given by eq. 1141 This will be called the esti- 
mation error. We can also calculate the expectation value of 
the flux from this distribution, this value can be compared 
with the most probable value (i.e. our estimated flux). 
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3.3 Results 

Taking into account the considerations above, we have ap- 
plied our algorithm to the simulations described in 13. H and 
obtained the following results. 

At the 30 GHz channel we have 4.9% contamination 
above 0.2 Jy with the BM, and 0.7% with the MF. However, 
the completeness is much better for the BM, 64%, than for 
the MF, 22%. From 0.7 Jy on we do not have any spurious 
source (BM) and the completeness is 99%. For the MF there 
are no spurious sources from 0.25 Jy on, but the complete- 
ness at 0.7 Jy is only 75%. In regard to the average value 
of the absolute value of the relative error (reconstruction 
error), when we calculate this error in flux intervals of 0.1 
Jy, we obtain similar values for the BM and the MF. For 
instance, we obtain errors below 15% from 0.6 Jy on and 
below 10% from 1 Jy on for both methods. Only 14% of the 
sources have a reconstruction error on the position of 1 pixel 
and only 3% have a higher error. These results are nearly 
the same for the BM and the MF. 

At the 44 GHz channel we have 6.5% contamination for 
fluxes higher than 0.2 Jy with the BM, and 4% with the 
MF. The completeness is 37% for the BM and 11% for the 
MF. From 0.8 Jy on we do not have any spurious source 



Figure 6. Completeness plotted against contamination for the 
BM and the MF (100 GHz). 



(BM) and the completeness is 100%. For the MF there are 
no spurious sources from 0.30 Jy on, but the completeness 
at 0.8 Jy is only 70%. As in the 30 GHz case, we obtain 
similar average values of the absolute value of the relative 
error for both methods. For instance, we obtain errors below 
15% from 0.90 Jy on in both cases. 15% of the sources have 
a reconstruction error on the position of 1 pixel and only 
3% have a higher error. These results are nearly the same 
for the BM and the MF. 

At the 70 GHz channel we have 3.2% of spurious sources 
for fluxes higher than 0.2 Jy with the BM, and 1% with the 
MF. The completeness is 45% for the BM and 19% for the 
MF. From 0.45 Jy on we do not have any spurious source 
(BM) and the completeness is 96%. For the MF there are 
no spurious sources from 0.30 Jy on, but the completeness 
at 0.45 Jy is only 46%. As in the cases above, we obtain 
similar average values of the absolute value of the relative 
error for both methods. For instance, we obtain errors below 
15% from 0.60 Jy on in both cases. 9% of the sources have 
a reconstruction error on the position of 1 pixel and only 
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Figure 7. Average value of the absolute value of the relative error 
plotted against the flux for the BM and the MF (100 GHz). We 
can see in the plot the low values of the error for both methods. 



Figure 8. Estimated flux against real flux for the BM (100 GHz). 
We have plotted the straight line y = x for comparison. 



1% have a higher error. These results are similar for the BM 
and the MF. 

At the 100 GHz channel we have 4.7% of spurious 
sources for fluxes higher than 0.2 Jy with the BM, and 7.5% 
with the MF. The completeness is 89% for the BM and 32% 
for the MF. From 0.3 Jy on we do not have any spurious 
source (BM) and the completeness is 100%. For the MF 
there are no spurious sources from 0.75 Jy on and the com- 
pleteness at 0.75 Jy is 94%. As in the cases above, we obtain 
similar average values of the absolute value of the relative 
error for both methods. For instance, we obtain errors be- 
low 10% from 0.5 Jy in both cases. 1% of the sources have a 
reconstruction error on the position of 1 pixel and there are 
no sources with a higher error. These results are similar for 
the BM and the MF. 

In order to visualize these results, we have plotted for 
the 100 GHz channel the contamination (integrated con- 
tamination) against the flux in Figure [J] the completeness 
(integrated completeness) against the flux in Figure O the 
completeness against the contamination in Figure [6l the av- 
erage value of the absolute value of the relative error in 
Figure [7] and finally, the estimated flux against the real flux 
in Figure IHl It is clear that for a given value of the contam- 
ination the completeness is higher for the BM than for the 
MF. Although the results are similar at all the studied fre- 
quencies, we have chosen the 100 GHz channel in order not 
to complicate unnecessarily the figures. 

In Figure [9] we plot the expectation value of the flux 
against the estimated flux for the 100 GHz channel. The 
expectation value is nearly the same as the most probable 
value. We also plot the 95% confidence intervals. In this 
way, we have an idea of the uncertainty of our estimates, 
this confidence interval is ~ 0.20 Jy (estimation error). The 
results at other frequencies are similar. 



4 CONCLUSIONS 

In this paper we propose a new strategy based on Bayesian 
methodology (BM), that can be applied to the blind detec- 
tion of point sources in CMB maps. The method incorpo- 



rates three prior distributions: a uniform distribution (|8]) on 
the source locations, an extended power law on the source 
fluxes (|10p and a Poisson distribution on the number of point 
sources per patch Together with a Gaussian likelihood, 
these priors produce the negative log-posterior (|12|l . 

We minimize this negative log-posterior with respect 
to the source fluxes in order to estimate them. At the same 
time, we show that the detected sources must be in the peaks 
of the matched-filtered maps. Finally, we choose the num- 
ber of point sources which minimizes p2|) for the estimated 
fluxes. 

In this way, we give a non-arbitrary method to select the 
number of point sources. Finally, to check the performance 
of this technique, we carry out flat CMB simulations for the 
Planck channels from 30 to 100 GHz. For simplicity, we have 
excluded the foregrounds in our simulations, assuming that 
we are considering zones of the sky which have been cleaned 
by the application of component separation methods. How- 
ever, we have included the confusion noise due to unresolved 
point sources in our simulations. 

We compare our Bayesian strategy with the application 
of a matched filter with a standard 5cr threshold. We calcu- 
late the contamination, the completeness and the relative 
error for both methods. Though the percentage of spurious 
sources is a little higher for the BM at low fluxes ~ 0.2 — 0.3 
Jy, the completeness is much better, allowing us to obtain 
catalogues with a 99% completeness and no spurious sources 
from 0.7 Jy (30 GHz), 0.8 Jy (44 GHz), 0.55 Jy (70 GHz) 
and 0.3 Jy (100 GHz) on. The reconstruction errors in the 
estimated fluxes are similarly low for both methods. 
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